# End-to-End Training

Coco Instance Eomt Large 1280
MIT
This paper proposes a method to reinterpret Vision Transformer (ViT) as an image segmentation model, demonstrating ViT's potential in image segmentation tasks.
Image Segmentation
tue-mps
105
0
Ade20k Panoptic Eomt Giant 1280
MIT
This paper proposes a method to reinterpret Vision Transformer (ViT) as an image segmentation model, revealing ViT's potential in image segmentation tasks.
Image Segmentation
tue-mps
96
0
Ade20k Panoptic Eomt Large 1280
MIT
This paper proposes an image segmentation model based on Vision Transformer (ViT), revealing the potential of ViT in image segmentation tasks.
Image Segmentation PyTorch
tue-mps
129
0
Coco Panoptic Eomt Large 1280
MIT
This paper proposes a novel perspective by treating Vision Transformer (ViT) as an image segmentation model and explores its potential in image segmentation tasks.
Image Segmentation
tue-mps
119
0
Ade20k Semantic Eomt Large 512
MIT
This model is based on the paper 'Your ViT is Secretly an Image Segmentation Model' and is a Vision Transformer model for image segmentation tasks.
Image Segmentation
tue-mps
108
0
Coco Panoptic Eomt Large 640
MIT
This model reveals the potential of Vision Transformer (ViT) in image segmentation tasks by adapting its architecture for segmentation purposes.
Image Segmentation
tue-mps
217
0
Coco Instance Eomt Large 640
MIT
This paper proposes a method to reinterpret Vision Transformer (ViT) as an image segmentation model, demonstrating ViT's potential in image segmentation tasks.
Image Segmentation
tue-mps
99
0
Coco Panoptic Eomt Giant 1280
MIT
By rethinking the architecture of Vision Transformer (ViT), this model demonstrates its potential in image segmentation tasks.
Image Segmentation PyTorch
tue-mps
90
0
Detr Finetuned Chess
Apache-2.0
This is an object detection model based on the DETR architecture, specifically fine-tuned for chess piece recognition tasks.
Object Detection Transformers
aesat
29
1
Yolov10x
YOLOv10x is a variant of YOLOv10, a YOLO-series model focused on real-time end-to-end object detection that targets higher detection accuracy and faster inference speed.
Object Detection
jameslahm
1,145
41
Yolov10l
YOLOv10 is a real-time end-to-end object detection model developed by a Tsinghua University team as an improved version of the YOLO series.
Object Detection
jameslahm
186
3
Yolov10b
YOLOv10 is a real-time end-to-end object detection model developed by a Tsinghua University team, representing an improvement in the YOLO series.
Object Detection
jameslahm
97
2
Yolov10n
YOLOv10 is a real-time end-to-end object detection model proposed by Tsinghua University, known for its efficiency and accuracy.
Object Detection Safetensors
jameslahm
3,326
17
Control V11p Sd15 Inpaint
Openrail
ControlNet v1.1 is a neural network architecture based on diffusion models, designed to control image generation through additional conditions, particularly suited for image inpainting tasks.
Image Generation Other
krnl
35
0
Mamba 3B Slimpj
Apache-2.0
A 3B-parameter language model based on the Mamba architecture, supporting English text generation tasks.
Large Language Model Transformers English
Q-bert
56
3
Detr Resnet 50 Finetuned Cppe5
Apache-2.0
DETR object detection model fine-tuned on an image folder dataset, based on facebook/detr-resnet-50
Object Detection Transformers
tree12344
20
0
Timesformer Bert Video Captioning
A video captioning model based on the TimeSformer and BERT architectures, capable of generating descriptive captions for video content.
Video-to-Text Transformers
AlexZigma
83
3
Encodec 48khz
MIT
EnCodec is a real-time high-fidelity neural audio codec developed by Meta AI, supporting multiple bandwidth configurations and streaming processing.
Audio Generation Transformers
facebook
23.25k
32
Donut Invoices
An invoice information extraction model fine-tuned on the Donut architecture, enabling OCR-free document understanding.
Image-to-Text Transformers
scharnot
70
2
Detr Resnet 50 Finetuned OCR
Apache-2.0
An OCR model fine-tuned from facebook/detr-resnet-50 for object detection tasks
Text Recognition Transformers
ismadoukkali
15
1
Deformable Detr Box Supervised
Apache-2.0
Deformable DETR is an object detection model based on Transformer architecture, trained on the LVIS dataset, supporting detection of 1203 object categories.
Object Detection Transformers
facebook
193
0
Re2g Qry Encoder Fever
Apache-2.0
Re2G is a generative model combining neural initial retrieval and reranking for knowledge-intensive tasks. This question encoder is a component of the Re2G system, used to encode questions into vectors for retrieval.
Text Embedding Transformers
ibm-research
17
0
Re2g Qry Encoder Nq
Apache-2.0
Re2G is an end-to-end system combining neural retrieval, reranking, and generation for knowledge-intensive tasks. This model serves as its Natural Questions (NQ) question encoder component.
Question Answering System Transformers
ibm-research
14
0
Cifar 10 Vgg Pretrained
Image classification model implemented with PyTorch, capable of recognizing multiple common object categories
Image Classification Transformers
amehta633
22
0
Wav2vec2 Base Timit Demo Colab
Apache-2.0
A demonstration speech recognition model fine-tuned from facebook/wav2vec2-base on the TIMIT dataset.
Speech Recognition Transformers
moaiz237
24
0
Gunnarthor Talromur A Fastspeech2
A FastSpeech2 text-to-speech model trained with the ESPnet framework on the talromur dataset, supporting Icelandic speech synthesis.
Speech Synthesis Icelandic
espnet
50
0
Vilt B32 Finetuned Vqa
Apache-2.0
ViLT is a vision-and-language transformer model fine-tuned on the VQAv2 dataset for visual question answering tasks.
Visual Question Answering Transformers
dandelin
71.41k
408
Wav2vec2 Gpt2 Wandb Grid Search
Automatic Speech Recognition (ASR) model trained on the LibriSpeech dataset
Speech Recognition Transformers
sanchit-gandhi
13
0
Wav2vec2 Large Xlsr Arabic Common Voice 10 Epochs
Arabic speech recognition model based on wav2vec2 architecture, trained for 10 epochs on the Common Voice dataset
Speech Recognition Transformers
salti
30
0
Summarization
A hybrid summarization model based on hierarchical reinforcement learning that combines the advantages of extractive and abstractive summarization to improve information richness and readability.
Text Generation Transformers
LiqiangXiao
18
4